Goto

Collaborating Authors

 spreading metric


Reviews: Hierarchical Clustering via Spreading Metrics

Neural Information Processing Systems

The paper of Dasgupta introduced a nice objective for hierarchical clustering, and presented a fairly simple algorithm for approximating the objective. I like this objective since it gives a nice principled way of comparing different algorithms (potentially new ones) for hierarchical clustering, as opposed to just showing bounds for linkage-based heuristics. I think the algorithm and analysis in this paper is very elegant. Every hierarchical clustering corresponds to an ultrametric; but they characterize the ultrametrics that correspond to the objective defined by Dasgupta, using some very nice properties. They use this characterization to come up with an LP and its rounding.


Hierarchical Clustering via Spreading Metrics

Neural Information Processing Systems

We study the cost function for hierarchical clusterings introduced by [Dasgupta, 2015] where hierarchies are treated as first-class objects rather than deriving their cost from projections into flat clusters. It was also shown in [Dasgupta, 2015] that a top-down algorithm returns a hierarchical clustering of cost at most \(O\left(\alpha_n \log n\right)\) times the cost of the optimal hierarchical clustering, where \(\alpha_n\) is the approximation ratio of the Sparsest Cut subroutine used. Thus using the best known approximation algorithm for Sparsest Cut due to Arora-Rao-Vazirani, the top down algorithm returns a hierarchical clustering of cost at most \(O\left(\log {3/2} n\right)\) times the cost of the optimal solution. We improve this by giving an \(O(\log{n})\)-approximation algorithm for this problem. Our main technical ingredients are a combinatorial characterization of ultrametrics induced by this cost function, deriving an Integer Linear Programming (ILP) formulation for this family of ultrametrics, and showing how to iteratively round an LP relaxation of this formulation by using the idea of \emph{sphere growing} which has been extensively used in the context of graph partitioning.